13 research outputs found

    Energy-efficient hardware design based on high-level synthesis

    Get PDF
    This dissertation describes research activities broadly concerning the area of High-level synthesis (HLS), but more specifically, regarding the HLS-based design of energy-efficient hardware (HW) accelerators. HW accelerators, mostly implemented on FPGAs, are integral to the heterogeneous architectures employed in modern high performance computing (HPC) systems due to their ability to speed up the execution while dramatically reducing the energy consumption of computationally challenging portions of complex applications. Hence, the first activity was regarding an HLS-based approach to directly execute an OpenCL code on an FPGA instead of its traditional GPU-based counterpart. Modern FPGAs offer considerable computational capabilities while consuming significantly smaller power as compared to high-end GPUs. Several different implementations of the K-Nearest Neighbor algorithm were considered on both FPGA- and GPU-based platforms and their performance was compared. FPGAs were generally more energy-efficient than the GPUs in all the test cases. Eventually, we were also able to get a faster (in terms of execution time) FPGA implementation by using an FPGA-specific OpenCL coding style and utilizing suitable HLS directives. The second activity was targeted towards the development of a methodology complementing HLS to automatically derive power optimization directives (also known as "power intent") from a system-level design description and use it to drive the design steps after HLS, by producing a directive file written using the common power format (CPF) to achieve power shut-off (PSO) in case of an ASIC design. The proposed LP-HLS methodology reduces the design effort by enabling designers to infer low power information from the system-level description of a design rather than at the RTL. This methodology required a SystemC description of a generic power management module to describe the design context of a HW module also modeled in SystemC, along with the development of a tool to automatically produce the CPF file to accomplish PSO. Several test cases were considered to validate the proposed methodology and the results demonstrated its ability to correctly extract the low power information and apply it to achieve power optimization in the backend flow

    High Performance and Low Power Monte Carlo Methods to Option Pricing Models via High Level Design and Synthesis

    Get PDF
    This article compares the performance and energy consumption of GPUs and FPGAs via implementing financial market models. The case studies used in this comparison are the Black-Scholes model and the Heston model for option pricing problems, which are analyzed numerically by Monte Carlo method. The algorithms are computationally intensive but not memory-intensive and thus well suited for FPGA implementation. High-level synthesis was performed starting from parallel models written in OpenCL and then various micro-architectures were explored and optimized on FPGAs. The final implementations of both models to several options on FPGAs achieved the best parallel acceleration systems, in terms of both performance-per-operation and energy-per-operation, compared not only to the kernels on advanced GPUs but also to the RTL implementations found in the literatures

    PERFORMANCE EVALUATION OF A MULTICARRIER MIMO SYSTEM BASED ON DFT-PRECODING AND SUBCARRIER MAPPING

    Get PDF
    The ever-increasing end user demands are instigating the development of innovative methods targeting not only data rate enhancement but additionally better service quality in each subsequent wireless communication standard. This quest to achieve higher data rates has compelled the next generation communication technologies to use multicarrier systems e.g. orthogonal frequency division multiplexing (OFDM), while also relying on the multiple-input multiple-output (MIMO) technology. This paper is focused on implementing a MIMO-OFDM system and on using various techniques to optimize it in terms of the bit-error rate performance. The test case considered is a system implementation constituting the enabling technologies for 4G and beyond communication systems. The bit-error rate optimizations considered are based on preceding the OFDM modulation step by Discrete Fourier Transform (DFT) while also considering various subcarrier mapping schemes. MATLAB-based simulation of a 2 Ɨ 2 MIMO-OFDM system exhibits a maximum of 2 to 5 orders of magnitude reduction in bit-error rate due to DFT-precoding and subcarrier mapping respectively at high signal-to-noise ratio values in various environments. A 2-3dBs reduction in peak-to-average power ratio due to DFT-precoding in different environments is also exhibited in the various simulations

    VR-ZYCAP: A versatile resourse-level ICAP controller for ZYNQ SOC

    Get PDF
    This article belongs to the Special Issue Architecture and CAD for Field-Programmable Gate Arrays (FPGAs)Hybrid architectures integrating a processor with an SRAM-based FPGA fabricā€”for example, Xilinx ZynQ SoCā€”are increasingly being used as a single-chip solution in several market segments to replace multi-chip designs. These devices not only provide advantages in terms of logic density, cost and integration, but also provide run-time in-field reconfiguration capabilities. However, the current reconfiguration capabilities provided by vendor tools are limited to the module level. Therefore, incremental run-time configuration memory changes require a lengthy compilation time for off-line bitstream generation along with storage and reconfiguration time overheads with traditional vendor methodologies. In this paper, an internal configuration access port (ICAP) controller that provides a versatile fine-grain resource-level incremental reconfiguration of the programmable logic (PL) resources in ZynQ SoC is presented. The proposed controller implemented in PL, called VR-ZyCAP, can reconfigure look-up tables (LUTs) and Flip-Flops (FF). The run-time reconfiguration of FF is achieved through a reset after reconfiguration (RAR)-featured partial bitstream to avoid the unintended state corruption of other memory elements. Along with versatility, our proposed controller improves the reconfiguration time by 30 times for FFs compared to state-of-the-art works while achieving a nearly 400-fold increase in speed for LUTs when compared to vendor-supported software approaches. In addition, it achieves competitive resource utilization when compared to existing approaches.This research was funded by Spanish Ministry of Science and Innovation under the ACHILLES project, grant number PID2019-104207RB-I00 and by Taif University Researchers Supporting fund, grant number (TURSP-2020/144), Taif University, Taif, Saudi Arabia

    Efficient FPGA Implementation of OpenCL High-Performance Computing Applications via High-Level Synthesis

    Get PDF
    FPGA-based accelerators have recently evolved as strong competitors to the traditional GPU-based accelerators in modern high-performance computing systems. They offer both high computational capabilities and considerably lower energy consumption. High-level synthesis (HLS) can be used to overcome the main hurdle in the mainstream usage of the FPGA-based accelerators, i.e., the complexity of their design flow. HLS enables the designers to program an FPGA directly by using high-level languages, e.g., C, C++, SystemC, and OpenCL. This paper presents an HLS-based FPGA implementation of several algorithms from a variety of application domains. A performance comparison in terms of execution time, energy, and power consumption with some high-end GPUs is performed as well. The algorithms have been modeled in OpenCL for both GPU and FPGA implementation. We conclude that FPGAs are much more energy-efficient than GPUs in all the test cases that we considered. Moreover, FPGAs can sometimes be faster than GPUs by using an FPGA-specific OpenCL programming style and utilizing a variety of appropriate HLS directives
    corecore